Add Crop Type Mapping tutorial #2449
base: main
@burakekim Can you add the tutorial to the table of contents in the rst file? Then one can view it in the CI docs to review as well :)
The initial and somewhat end-to-end draft is now out:
There are quite a few things I want to correct and improve:
P.S. In the next iteration, I am thinking of renaming the case study to
Co-authored-by: Adam J. Stewart <[email protected]>
Still need to actually look at the code, but here are responses to your TODOs:
Note that this needs to run in CI, preferably in seconds, not days. We can monkeypatch certain hyperparams to make this faster, but it shouldn't require a GPU.
Happy to do this if it makes the above faster while still getting good results.
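One way to keep the tutorial fast in CI without a GPU, sketched under assumptions: pytest sets the `PYTEST_CURRENT_TEST` environment variable while a test is running, so the notebook can shrink its settings when that variable is present. The `tutorial_hparams` helper and all of its values are hypothetical; the keys `max_epochs`, `fast_dev_run`, and `accelerator` mirror argument names of Lightning's `Trainer`, and `batch_size` would go to the datamodule.

```python
import os
from typing import Any, Dict, Mapping, Optional


def tutorial_hparams(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: small settings under pytest, normal settings otherwise."""
    env = os.environ if env is None else env
    # pytest sets PYTEST_CURRENT_TEST while a test is running.
    in_tests = "PYTEST_CURRENT_TEST" in env
    return {
        # Illustrative values only, not tuned for this tutorial.
        "max_epochs": 1 if in_tests else 10,
        # With fast_dev_run=True, Lightning runs a single batch through each loop.
        "fast_dev_run": in_tests,
        # CI must not need a GPU; "auto" picks one up when available (e.g. on Colab).
        "accelerator": "cpu" if in_tests else "auto",
        "batch_size": 2 if in_tests else 16,
    }


print(tutorial_hparams({"PYTEST_CURRENT_TEST": "test_tutorial (call)"}))
print(tutorial_hparams({}))
```

The resulting dict can then be unpacked into the `Trainer` and datamodule calls, so the same notebook serves both CI and an in-person run.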
Avoid big data: this needs to run in CI, where we have very limited storage, and we don't want to wait 10 min to download data during a tutorial. Not sure what you mean by opportunistic sampling, but there are various GeoDataset splitting methods you can use to chop a tile into east/west splits, grids, etc.
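To illustrate the two split styles on a single tile, here is a dependency-free sketch. It uses a hypothetical `Box` tuple (same x/y ordering as TorchGeo's `BoundingBox`, but without the time axis) rather than real GeoDataset bounds; in TorchGeo itself, the `torchgeo.datasets.splits` module provides ready-made functions such as `roi_split` and `random_grid_cell_assignment`, whose exact signatures should be checked in the docs for the installed release.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical axis-aligned bounding box in map coordinates."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def east_west_split(box: Box, west_fraction: float = 0.8) -> Tuple[Box, Box]:
    """Cut a tile along a vertical line: west part (e.g. train) and east part (e.g. test)."""
    if not 0.0 < west_fraction < 1.0:
        raise ValueError("west_fraction must be in (0, 1)")
    cut = box.minx + west_fraction * (box.maxx - box.minx)
    west = Box(box.minx, cut, box.miny, box.maxy)
    east = Box(cut, box.maxx, box.miny, box.maxy)
    return west, east


def grid_cells(box: Box, rows: int, cols: int) -> List[Box]:
    """Chop a tile into rows x cols equal cells, listed row by row starting from the south."""
    dx = (box.maxx - box.minx) / cols
    dy = (box.maxy - box.miny) / rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append(Box(box.minx + c * dx, box.minx + (c + 1) * dx,
                             box.miny + r * dy, box.miny + (r + 1) * dy))
    return cells


tile = Box(0.0, 100.0, 0.0, 50.0)
train_box, test_box = east_west_split(tile, 0.8)
print(train_box, test_box)
print(len(grid_cells(tile, 2, 2)))
```

Grid cells can then be assigned to train/val/test at random, which keeps neighbouring pixels in the same split and limits spatial leakage compared with random pixel or patch sampling.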
I think both add value. Basically, we should have some kind of binary semantic segmentation application, and some kind of multiclass semantic segmentation application. They don't both have to be for agriculture though. For binary, something like building mapping may make more sense. Also, tasks involving agriculture benefit greatly from time-series data. I'm planning on extending this tutorial for time series once we add support for it. So don't worry too much about the details right now, they will change in the future. This will also make the big data problem even worse, so keep the images small for now.
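The practical difference between a binary and a multiclass segmentation application shows up in the model's output head: a binary task usually predicts one logit per pixel and thresholds its sigmoid, while a multiclass task predicts one logit per class per pixel and takes the argmax. A per-pixel sketch in plain Python, with hypothetical helper names (a real model would do this on whole tensors):

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_label(logit: float, threshold: float = 0.5) -> int:
    """Binary segmentation: one logit per pixel, e.g. building (1) vs. background (0)."""
    return int(sigmoid(logit) >= threshold)


def multiclass_label(logits: Sequence[float]) -> int:
    """Multiclass segmentation: one logit per class per pixel (e.g. crop type); take the argmax."""
    return max(range(len(logits)), key=lambda k: logits[k])


print(binary_label(2.0), binary_label(-1.0))
print(multiclass_label([0.1, 2.5, -0.3]))
```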
Would prefer to avoid any additional dependencies if we can. Any reason we can't plot a static map with matplotlib?
I agree with the rename. Both "Crop Classification" and "Crop Type Mapping" are common names. I think the latter may actually be even more common, and more technically correct. A computer vision person may argue that this is semantic segmentation, not classification. Of course, semantic segmentation is just pixelwise classification, so the distinction isn't too important.
Re: Using a dynamic map of overlaid mask and Sentinel-2 AOIs: I think it just looks nice, which is not a legitimate reason to keep it. I will discard it later.
Re: Naming: Changed to
Re: CPU-friendly pipeline: With the current setup, each epoch takes ~8 minutes. I was targeting training+test in 30 minutes on my laptop, which gives me a budget of 3 epochs. However, with that budget, the prediction does not look pretty at all for 10-class classification. We can do one or both: increase the tutorial time budget, or reduce the number of classes from 10 to, say, 3 (the three most frequently occurring classes in our AOI). What are your priorities?
Re: Uploading the weights and Sentinel-2 patch to HF: This does not seem to offer a great deal of convenience unless we specifically want users to have the pretrained weights and Sentinel-2 patch and skip straight to inference. I do not think that is the case, because it bypasses the end-to-end showcase of TG abilities, which is the key feature of this tutorial.
Re: Multi-class setting: I first listed all the crop type classes that fall under our AOI, got the 10 crop type classes with the highest occurrence, and set that as the
Thinking about it, we could consider a method for users to choose the top-N crop types based on their occurrences in their AOIs. Otherwise, for larger AOIs (ours is fairly small and initially had 30 classes, whereas Estonia, for example, has 239), the number of crop types might make the task challenging for those with specific experimental needs (we likely do not need a 239-class classification task). This would be another PR, though.
Change this from minutes to seconds and then this might get merged. Imagine presenting a tutorial in person, clicking "run", and telling people to just wait 30 min and it will be ready. It isn't hard to hack this to be fast in CI, but it should also be fast in person. This is where pre-trained models can help. The exact number of crop types that is appropriate depends on the region, but 10 sounds okay to me.
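The top-N selection proposed earlier in the thread could look roughly like this: count label occurrences over the AOI, keep the N most frequent crop codes, and remap them to contiguous class indices. Assumptions: labels arrive as integer crop codes, code 0 means background/no data, dropped classes are folded into background (a separate "other" class is an equally valid choice), and the helper names are hypothetical, not part of TorchGeo.

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_n_classes(labels: Iterable[int], n: int, background: int = 0) -> List[int]:
    """The n most frequent crop codes in the AOI, ignoring the background code."""
    counts = Counter(code for code in labels if code != background)
    return [code for code, _ in counts.most_common(n)]


def remap_labels(labels: Iterable[int], kept: List[int], background: int = 0) -> List[int]:
    """Map kept codes to contiguous class ids 1..n; every other code becomes background."""
    lookup: Dict[int, int] = {code: i + 1 for i, code in enumerate(kept)}
    return [lookup.get(code, background) for code in labels]


pixels = [0, 7, 7, 7, 3, 3, 9, 0, 3, 7]  # toy mask, flattened
kept = top_n_classes(pixels, 2)
print(kept, remap_labels(pixels, kept))
```

With N exposed as a parameter, a user mapping a large region (such as the 239 crop types mentioned for Estonia) could trim the task to a manageable number of classes.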
The current setup includes a pre-trained model and still takes 8 min/epoch. I can make the AOI much smaller, but that might make the prediction look less pretty and force us to use fewer classes. Do you have any tricks in mind to go from minutes to seconds?
How large is the current AOI? If we use a pre-trained decoder, it doesn't matter how small it is; the predictions will still be good. For CI, we can use
It is 1640x1531 pixels. I use the RGB bands of Sentinel-2 with the This is the setup I thought would be the fastest, but it is still slow, likely due to one or both of the following reasons:
Okay, that's not that big, we should be able to make this work.
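For scale, a back-of-the-envelope count of training chips for the 1640x1531 scene (about 2.5 million pixels). The 256-pixel chip size and the strides are hypothetical choices, not the tutorial's actual settings; the count includes a partial last chip along each axis so that the whole scene is covered.

```python
import math


def chips_per_axis(length: int, size: int, stride: int) -> int:
    """Sliding-window chips needed to cover `length` pixels, counting a partial last chip."""
    if length <= size:
        return 1
    return math.ceil((length - size) / stride) + 1


def chip_count(height: int, width: int, size: int, stride: int) -> int:
    """Total chips for a height x width scene on a regular grid."""
    return chips_per_axis(height, size, stride) * chips_per_axis(width, size, stride)


print(chip_count(1531, 1640, size=256, stride=256))  # non-overlapping chips
print(chip_count(1531, 1640, size=256, stride=128))  # 50% overlap
```

Without overlap this gives 6 x 7 = 42 chips per pass, and 11 x 12 = 132 with 50% overlap.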
I don't think we should freeze the decoder (unless we have a pretrained decoder), even if it makes things faster. I don't want to teach people something useless.
Can you try on Google Colab/Lightning Studios? Is that any better? That's the expected run environment, so as long as it is fast enough there, I'm happy.
https://colab.research.google.com/drive/119g5jC1WxxeTuYLHcCECpYENx6UpIgHF?usp=sharing Yep, it was my laptop. On Colab it takes <1 minute per epoch. I am working on the hyperparameters to get an acceptable prediction.
Adding a new tutorial according to #2418
This tutorial demonstrates how to combine the Sentinel-2 and EuroCrops datasets using the `Sentinel2EuroCropsDataModule`. It covers training a semantic segmentation model, along with evaluation and inference steps.
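Combining a raster dataset with a vector label dataset in TorchGeo amounts to a spatial intersection: `dataset_a & dataset_b` builds an `IntersectionDataset`, and samples are drawn only where both datasets have coverage. Below is a dependency-free sketch of the underlying overlap computation, assuming both extents are already in the same CRS and using a hypothetical `Box` tuple instead of TorchGeo's own `BoundingBox`.

```python
from typing import NamedTuple, Optional


class Box(NamedTuple):
    """Hypothetical axis-aligned extent in a shared CRS."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Overlap of two extents, or None if they share no area (touching edges do not count)."""
    minx, maxx = max(a.minx, b.minx), min(a.maxx, b.maxx)
    miny, maxy = max(a.miny, b.miny), min(a.maxy, b.maxy)
    if minx >= maxx or miny >= maxy:
        return None
    return Box(minx, maxx, miny, maxy)


sentinel2_extent = Box(0, 100, 0, 100)
eurocrops_extent = Box(50, 150, -20, 40)
print(intersect(sentinel2_extent, eurocrops_extent))
```

Only the overlapping region can yield (image, mask) pairs, which is why the AOI has to sit inside both the Sentinel-2 scene and the EuroCrops coverage.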